This paper discusses several manipulations of LPC (linear predictive 
coding) parameters for providing speech encryption. Specifically, the 
paper considers temporal rearrangement or scrambling of the LPC code 
sequence, as well as the alternative of perturbing individual samples in 
the sequence by means of pseudo-random additive or multiplicative noise. 
The latter approach is believed to have greater encryption potential than 
the temporal scrambling technique, in terms of the time needed to "break 
the secrecy code." The encryption techniques are assessed on the basis of 
perceptual experiments, as well as by means of a quantitative assessment 
of speech-spectrum distortion, as given by an appropriate "distance" 
measure. 

I. INTRODUCTION 

Encryption can be an important requirement in speech communica- 
tion systems. Conventionally, encryption has largely been accomplished 
by signal manipulations in the frequency domain; for example, by 
means of spectrum inversion techniques. 1 With the increased popularity 
of digital codes for speech transmission, time-domain encryption tech- 
niques have received increased attention. Typically the time-domain 
encryption technique consists of temporal rearrangement of samples 
within a time block. For the scrambling of pcm bits in speech wave- 
form coding, a block-length that is at least a pitch period long is 
usually adequate to provide a nonspeech-like output waveform. 
Similarly, the scrambling of differential pcm and delta-modulation bits 
can also produce a nonspeech-like output waveform provided that the 
time-block is sufficiently long. For example, in a 24-kb/s speech code, 
this constraint implies approximately a block length of 64 samples for 
an adequate scrambling of the coded bits. 2 

The temporal scrambling of speech samples within millisecond- 
length blocks generally provides what may be referred to as casual 
encryption. This means that a noncasual 'eavesdropper' can break the 
speech secrecy code by the simple expedient of running through a finite 
number of possible rearrangements of the disarranged speech samples 
that are received. Greater degrees of encryption or secrecy can be 
achieved by employing much longer speech blocks for scrambling or, 
alternatively, by subjecting individual speech samples to pseudo- 
random additive or multiplicative perturbations whose undoing is 
typically more time-consuming than a simple temporal rearrangement 
of clean digits or bits. 

The purpose of this paper is to point out that casual encryption as 
well as more formal secrecy can be achieved by appropriate manipulations
of the linear predictive coding (lpc) parameters. 3,4 The use of
an lpc code is by no means a necessary requirement for encryption; it
can be achieved in conjunction with any kind of speech digitizers, such 
as the waveform codes 5 discussed above. However, when the channel 
capacity of communication systems dictates a low-bit-rate vocoder 
instead of a generally higher-bit-rate waveform code, the lpc parameter 
manipulations discussed in this paper may provide a naturally ap- 
propriate basis for speech encryption and/or secrecy. It shall also be 
seen that an efficient encryption of the lpc parameters can be achieved 
more readily than similar techniques used to encrypt waveform codes. 
For example, an adequate block length for scrambling the lpc parame- 
ters can be as short as 6 to 8 samples, while the block length for wave- 
form scrambling is typically 16 to 64 samples. 

In this paper, Section II provides a brief description of lpc encoding 
of speech, while Section III considers the use of temporal scrambling 
and pseudo-random sample perturbations for casual and formal 
encryption in the lpc domain. Section IV describes attempts to measure 
the efficacy of the encryption techniques. These measurements in- 
volved informal perceptual experiments (the results are usually un- 
ambiguous and one-dimensional enough not to require formal sub- 
jective testing), as well as a comparison of alternative techniques in 
terms of speech-spectrum distortions that they provide. The spectrum 
distortion was assessed by an appropriate distance measurement. This 
distance approach has the advantage of being quantitative; however,
as discussed in Section IV, the distance criterion has to be invoked 
with caution because spectral distortion, as such, is not a definitive 
measure of speech encryption. 

II. LINEAR PREDICTION SPEECH MODEL 

The method of linear prediction has proved quite popular and
successful for use in speech-compression systems. 3,6,7 In this method,
speech is modeled as the output of an all-pole filter H(z) that is excited
by a sequence of pulses separated by the pitch period for voiced sounds
or pseudo-random noise for unvoiced sounds. These assumptions imply
that within a frame of speech, the output speech sequence is given by

$$ s_n = \sum_{k=1}^{p} a_k s_{n-k} + G u_n, $$

where p is the number of modeled poles, $u_n$ is the appropriate input
excitation, G is the gain of the filter, and the $a_k$'s are the coefficients
characterizing the filter (linear prediction coefficients). Figure 1
illustrates the frequency-domain as well as the equivalent time-domain
model of linear prediction speech production. To account for the
nonstationary character of the speech waveform, the parameters $a_k$ of the
modeled filter are periodically updated during successive speech
frames. * Generation of speech in this method requires a knowledge of
the pitch, the filter parameters, and the gain of the filter (amplitude of
excitation) in each speech frame.

Fig. 1. Discrete model of speech production as employed in linear prediction.
(a) Frequency-domain model. (b) Time-domain model.
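The synthesis recursion above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the unit-impulse pulse train for voiced frames, and the uniform noise for unvoiced frames are all hypothetical choices.

```python
import random

def synthesize_frame(a, gain, excitation, history=None):
    """Run s[n] = sum_k a[k] * s[n-k] + gain * u[n] over one frame.

    a          : predictor coefficients a_1 .. a_p (a[0] holds a_1)
    history    : the p most recent output samples, newest first (assumed zero if None)
    """
    p = len(a)
    past = list(history) if history is not None else [0.0] * p
    out = []
    for u in excitation:
        s = gain * u + sum(a[k] * past[k] for k in range(p))
        out.append(s)
        past = [s] + past[:-1]  # newest sample moves to the front
    return out

def voiced_excitation(n, pitch_period):
    """Unit pulses separated by the pitch period (in samples)."""
    return [1.0 if i % pitch_period == 0 else 0.0 for i in range(n)]

def unvoiced_excitation(n, seed=0):
    """Pseudo-random noise; the uniform distribution is an assumption."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]
```

For example, `synthesize_frame([0.5], 2.0, voiced_excitation(160, 80))` would produce one 160-sample frame of a single-pole model excited at an 80-sample pitch period.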

The lpc coefficients model the combined effects of the vocal tract,
glottal source, and radiation load in each frame of speech. Manipulations
of the lpc coefficients can seriously perturb the frequency character
of the speech signal and, hence, destroy the linguistic information
present in the signal. In contrast, the measurements of pitch and gain
represent the prosodic aspects of the speech and some characteristics
of the speaker. Manipulations of pitch and gain parameters will affect
the prosody of the speech, but not seriously diminish the linguistic
aspects of the waveform. In Section III, we consider several methods for
efficiently manipulating the lpc coefficients so as to encrypt the speech
signal.

* A frame is a segment of speech thought adequate to assume stationarity of the
speech process. Typical frame lengths employed range from 10 to 30 ms.

Since the purpose of this paper is the consideration of encryption 
techniques for low-bit-rate vocoders (2.4 kb/s or less), the manipula- 
tion schemes discussed in Section III were not performed directly on 
the lpc coefficients, but rather on more desirable alternate representa- 
tions of these coefficients. The stability of the linear-prediction filter, 
H(z), is extremely sensitive to small perturbations in the lpc coefficients 
and, thus, it is not possible to achieve low-bit-rate coding by trans- 
mitting the lpc coefficients. 6 However, by transmitting either the log 
area coefficients or the parcor coefficients, a 2.4-kb/s vocoder is readily 
achieved. 6 The log area coefficients are nonlinearly related to the lpc 
coefficients by

$$ g_i = \log\frac{1 + k_i}{1 - k_i}, \qquad i = 1, \ldots, p, $$

where the $k_i$'s are termed the parcor coefficients. 7 If we denote $a_i^{(j)}$ as
the ith linear prediction coefficient for a jth-pole linear-prediction
model, then

$$ k_i = a_i^{(i)}. $$

The parcor coefficients have the very important property that if 

$$ |k_i| < 1, \qquad i = 1, \ldots, p, $$

then it is guaranteed that the linear prediction filter is stable. 4 Thus, 
small perturbations in the parcor coefficients or log-area coefficients 
will not affect the stability of the modeled filter. 
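The relations among the three parameter sets can be illustrated with a short Python sketch. It assumes the common sign conventions, namely the log-area ratio log[(1 + k)/(1 - k)] and the step-up recursion in which the last coefficient of the ith-order predictor equals $k_i$; other texts use the opposite signs, and the function names are hypothetical.

```python
import math

def parcor_to_log_area(k):
    """Log-area coefficients g_i = log((1 + k_i) / (1 - k_i)); requires |k_i| < 1."""
    return [math.log((1.0 + ki) / (1.0 - ki)) for ki in k]

def log_area_to_parcor(g):
    """Inverse mapping k_i = tanh(g_i / 2), which lies inside (-1, 1) for real g_i."""
    return [math.tanh(gi / 2.0) for gi in g]

def parcor_to_predictor(k):
    """Step-up recursion: build a_1 .. a_p for s[n] = sum_k a_k s[n-k].

    At order i the new last coefficient is k_i, and the earlier ones become
    a_j(i) = a_j(i-1) - k_i * a_{i-j}(i-1).
    """
    a = []
    for i, ki in enumerate(k, start=1):
        prev = a
        a = [prev[j] - ki * prev[i - 2 - j] for j in range(i - 1)] + [ki]
    return a
```

Because the inverse mapping is a hyperbolic tangent, any real log-area value maps back to a parcor coefficient of magnitude below 1 (in exact arithmetic), which is the stability property the text relies on.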

III. ENCRYPTION TECHNIQUES 
3.1 Temporal scrambling 

The rearrangement of samples within a block of length L is achieved
by assigning to each sample a new address A (A = 1, or 2, or 3, ...,
or L) as determined by the state of a maximal-length shift-register
arrangement. The theory and design of maximal-length sequences is
well documented. 8,9 Here, we simply provide a constructive
recapitulation for the purpose of this paper. The idea is to start with a shift
register whose length is $D = \log_2 L$ (assume that the block length is
a power of 2, and that elements in the register are either 1 or 0). The
next step is to select a so-called primitive polynomial $P_D(x)$ of degree
D, and to include stage (D - S) in the register (S = 0 to D - 1) in
an exclusive-OR (modulo-2 add) feedback arrangement if the coefficient
of $x^S$ in $P_D(x)$ is nonzero. The resulting network now generates a
succession of $2^D - 1 = L - 1$ nonzero states in the shift register at
successive 'clock' times, after which the cycle repeats, starting once again
with the original initial state of the shift register. The number of
nonzero states in the cycle is identically equal to the repetition period
L - 1 of the cycle. Consequently, the L - 1 states of the shift
register (specifically, their decimal equivalents) can be utilized as
"pseudo-random" addresses for a block of L - 1 input samples in a
one-to-one mapping of addresses. If the input block has L rather than
L - 1 samples (because of the frequent requirement that L be a power
of 2), the address of the Lth sample is usually left unaltered by the
scrambler. Such simplicity is not, however, inevitable, and appropriate
manipulations that scramble all L samples are quite conceivable.

Figure 2 illustrates the scrambler design for the example of D = 3
and L = 7, as defined by a primitive polynomial $P_3(X) = X^3 + X^2 + 1$.
It is seen how input samples (1, 2, 3, 4, 5, 6, and 7) get
scrambled into the pseudo-random positions (1, 4, 6, 7, 3, 5, and 2) in
a reversible one-to-one mapping.

Figure 3 illustrates an alternative design, as defined by a second
primitive polynomial of degree 3, $P_3(X) = X^3 + X + 1$. In this case,
the output addresses of the input samples are the positions (1, 4, 2, 5,
6, 7, and 3).
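The two three-stage designs can be reproduced with a small Python sketch of the shift-register rule of Section 3.1. It assumes that stage 1 is the most significant bit, that the register shifts toward stage D with the exclusive-OR of the tapped stages fed back into stage 1, and that the initial state is 001; under those assumptions the polynomials $X^3 + X^2 + 1$ and $X^3 + X + 1$ yield exactly the address lists quoted for Figs. 2 and 3. The function names are hypothetical.

```python
def lfsr_addresses(poly_exponents, degree, seed=1):
    """Return the 2**degree - 1 successive register states as decimal addresses.

    poly_exponents : exponents S < degree with a nonzero coefficient in P(x),
                     e.g. (0, 2) for x^3 + x^2 + 1; stage (degree - S) is tapped.
    seed           : nonzero initial state, stage 1 being the most significant bit.
    """
    taps = [degree - s for s in poly_exponents]  # 1-based stage numbers
    state = [(seed >> (degree - 1 - i)) & 1 for i in range(degree)]
    addresses = []
    for _ in range(2 ** degree - 1):
        addresses.append(int("".join(str(bit) for bit in state), 2))
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]  # shift toward stage D, feed back into stage 1
    return addresses

def scramble(block, addresses):
    """Move input sample i (1-based) to output position addresses[i - 1]."""
    out = [None] * len(block)
    for i, address in enumerate(addresses):
        out[address - 1] = block[i]
    return out

def descramble(block, addresses):
    """Invert scramble() given the same address list."""
    return [block[address - 1] for address in addresses]

fig2 = lfsr_addresses((0, 2), 3)  # x^3 + x^2 + 1 -> [1, 4, 6, 7, 3, 5, 2]
fig3 = lfsr_addresses((0, 1), 3)  # x^3 + x + 1   -> [1, 4, 2, 5, 6, 7, 3]
```

Changing `seed` to any other nonzero value gives a cyclically shifted state sequence, and hence a different address mapping for the same polynomial.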

It is clear that in each of the arrangements in Figs. 2 and 3, the use
of a different initializing sequence (other than 001) can lead to a
totally different mapping of sample addresses. There would be L - 1
nonzero initializations, corresponding to every given $P_D(X)$.
Incidentally, the number of primitive polynomials of degree 3 is 2.

Fig. 2. Scrambler design with a three-stage shift register.

Fig. 3. Alternate scrambler design with a three-stage shift register.

Table I lists for D = 1 to 12 a typical set of primitive polynomials 
and also the number of primitive polynomials for each D. Note, for 
example, that a 12-stage shift register with an exclusive-OR feedback
network involving stages 12, 11, 8, and 6 provides one of 144 possible
bases for a scrambler that would operate on an input block of $2^{12} = 4096$
samples. 
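The counts quoted here (2 primitive polynomials for D = 3, 144 for D = 12) agree with the standard number-theoretic result that the number of binary primitive polynomials of degree D is $\varphi(2^D - 1)/D$, where $\varphi$ is Euler's totient function. The formula is not stated in the paper; the following Python sketch, with hypothetical function names, simply evaluates it.

```python
def euler_phi(n):
    """Euler's totient function by trial-division factorization."""
    result = n
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            result -= result // p
        p += 1
    if n > 1:
        result -= result // n
    return result

def count_primitive_polynomials(degree):
    """Number of binary primitive polynomials of the given degree."""
    return euler_phi(2 ** degree - 1) // degree

def count_scrambler_designs(degree):
    """Polynomials times nonzero initial states (2**degree - 1 of them)."""
    return count_primitive_polynomials(degree) * (2 ** degree - 1)
```

Under this count, a 12-stage register offers 144 x 4095 combinations of polynomial and nonzero initialization, which is the quantity an exhaustive code-breaker would have to search.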

The possibility of alternate scrambler designs (as defined by different
initializations and/or different primitive polynomials) is an important
consideration from the point of view of the average descrambling
time needed for an eavesdropping code-breaker.

3.2 LPC parameter scrambling 

The effectiveness of any scrambling scheme in perturbing the
sequence of samples is directly proportional to the lack of similarity or
dynamic range of the samples to be scrambled. The greater the range
of values assumed by the samples, the more effective the scrambling
scheme. 1 For an efficient scrambling of the lpc parameters, let us begin
by ordering the parameters in the following manner: the first sample
in the first block is $x_{11}$, where $x_{in}$ denotes the ith lpc parameter* in
the nth analysis frame. The second sample is $x_{21}$ and the third sample
is $x_{31}$. The arrangement proceeds in this fashion until the (p + 1)th†
sample, which is defined as $x_{12}$. Thus, the ordering of lpc parameters
for purposes of scrambling is simply a concatenation of the p lpc
parameters in each sequential analysis frame.

Table I. List of primitive polynomials

Using this particular arrangement, it can be seen that within a block 
of data there is a wide distribution of values assumed by the various 
samples. This observation follows from the fact that the measured 
lpc parameters for any given analysis frame will usually vary across 
the entire permissible range of values. For example, the p measured 
values of the parcor coefficients in any given frame will typically be 
somewhat uniformly spread across the permissible range of -1 to 1. 4
The particular arrangement of lpc parameters given above will thus 
be effective for scrambling purposes due to the large resulting dynamic 
range. In Section IV, we show that a block length as small as eight 
samples (L = 8) is sufficient to destroy the linguistic information in the 
synthetic signal produced by a 12th order analysis (p = 12). 
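The ordering and block scrambling described in this section can be sketched as follows. The sketch assumes a block length of L = 8 whose first seven addresses follow the Fig. 2 mapping, with the eighth sample left in place as Section 3.1 allows, and it leaves any trailing partial block unscrambled; both choices, like the function names, are illustrative assumptions rather than details given in the paper.

```python
# Zero-based destinations for one block of 8: the Fig. 2 addresses
# (1, 4, 6, 7, 3, 5, 2) minus one, with the eighth sample left unaltered.
FIG2_PERMUTATION = [0, 3, 5, 6, 2, 4, 1, 7]

def concatenate_frames(frames):
    """Order samples as x_11, x_21, ..., x_p1, x_12, ... (frame after frame)."""
    return [x for frame in frames for x in frame]

def split_frames(samples, p):
    """Regroup a flat parameter sequence into frames of p parameters."""
    return [samples[i:i + p] for i in range(0, len(samples), p)]

def scramble_blocks(samples, permutation):
    """Permute each complete block; a trailing partial block is copied unchanged."""
    length = len(permutation)
    out = list(samples)
    for start in range(0, len(samples) - length + 1, length):
        for i, dest in enumerate(permutation):
            out[start + dest] = samples[start + i]
    return out

def descramble_blocks(samples, permutation):
    """Invert scramble_blocks() with the same permutation."""
    length = len(permutation)
    out = list(samples)
    for start in range(0, len(samples) - length + 1, length):
        for i, dest in enumerate(permutation):
            out[start + i] = samples[start + dest]
    return out
```

With p = 12 and L = 8, block boundaries do not coincide with frame boundaries, so a parameter may land in a different coefficient position and, for blocks that straddle two frames, in a different frame.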

3.3 Pseudo-random perturbations 

For a more secure secrecy coding of the speech signal, the lpc pa- 
rameters can be modified by a pseudo-random additive or multiplica- 
tive perturbation. Since the repetitive period of any typical pseudo- 
random number generator is extremely large, the process of undoing 
or breaking the encryption is quite difficult and time-consuming. 

Since one of the goals of the present study was to perceptually assess 
the linguistic information in the synthesized speech generated by the 
encrypted lpc parameters, the pseudo-random number perturbation 
scheme was designed to retain the stability of the modified lpc filter. 
Thus, for the manipulation of the parcor coefficients, the pseudo-random
number technique involved the transmission of the sequence of
parameters

$$ y_{in} = k_{in} \times r_{in}, $$

where

$k_{in}$ = ith parcor coefficient in nth frame

$r_{in}$ = ith pseudo-random number in nth frame; $|r_{in}| \le 1$.



* The lpc parameters considered in this paper are either the log-area coefficients
or parcor coefficients.

† p = order of lpc analysis.
Since $|r_{in}| \le 1$, $|y_{in}| < 1$ and the stability of the lpc filter is
guaranteed. For the modification of the log-area coefficients, the technique
is simply to transmit

$$ y_{in} = g_{in} + r_{in}, $$

where $g_{in}$ denotes the ith log-area coefficient in the nth frame.
The stability of the resulting lpc filter is guaranteed regardless of the
range of $r_{in}$. This result follows from the fact that any real value of
$y_{in}$ will lead to parcor parameters that are less than 1 in magnitude.
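Both perturbation rules can be sketched in Python. The sketch below is an illustration under assumptions: the seeded `random.Random` generator stands in for whatever pseudo-random source a real system would use, the `n_perturbed` and `spread` parameters are hypothetical conveniences, and the mapping from log-area value back to parcor coefficient assumes the tanh relation of the usual log-area definition.

```python
import math
import random

def perturb_parcor(k_frame, rng, mode="uniform", n_perturbed=None):
    """Multiply the first n_perturbed parcor coefficients by r with |r| <= 1.

    mode "uniform": r drawn uniformly from [-1, 1]
    mode "sign"   : r is -1 or +1 with equal probability
    Coefficients beyond n_perturbed are multiplied by r = 1 (left unchanged).
    """
    n = len(k_frame) if n_perturbed is None else n_perturbed
    out = []
    for i, k in enumerate(k_frame):
        if i >= n:
            r = 1.0
        elif mode == "uniform":
            r = rng.uniform(-1.0, 1.0)
        else:
            r = rng.choice((-1.0, 1.0))
        out.append(k * r)
    return out

def perturb_log_area(g_frame, rng, spread=1.0):
    """Add a pseudo-random number to each log-area coefficient."""
    return [g + rng.uniform(-spread, spread) for g in g_frame]

def log_area_to_parcor(g_frame):
    """Map log-area values back to parcor coefficients (assumed tanh relation)."""
    return [math.tanh(g / 2.0) for g in g_frame]
```

Since the multiplier never exceeds 1 in magnitude, every perturbed parcor value stays inside the stable range, and a receiver that shares the generator seed could regenerate the same sequence of multipliers.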

In viewing the pseudo-random number manipulation of the lpc 
parameters, it should be noted that the spectral characteristics of the 
lpc filter are more sensitive to changes in the parcor coefficients than 
to changes in the log-area coefficients. 10 Thus, manipulation of the 
parcor coefficients is a more direct and efficient technique for perturbing 
the spectral properties of the lpc filter. For this reason the pseudo- 
random techniques discussed in this paper were applied only to the 
parcor coefficients. If pseudo-random number manipulation is to be 
applied to the log-area coefficients, the manipulation can be made most 
effective if the probability distribution of the random number generator 
is nonuniform, in order to mimic that of the log-area coefficient. 10 

For the experimental examination of the pseudo-random number 
perturbation of the parcor coefficients, the following two probability 
distributions were used for generating $r_{in}$:

(i) $r_{in}$ was uniformly distributed between -1 and 1, or
(ii) $r_{in}$ was, with equal probability, set to -1 or 1.

The second distribution was studied because the resulting manipulation 
of the parcor coefficients is particularly easy to perform and, as we 
shall soon discuss, is effective in destroying the intelligibility of the 
encrypted speech. However, the "breaking" of the encryption coding 
using the second distribution is not difficult to achieve by using the 
available knowledge of the statistical range of the parameters. For 
example, it is well known that the first parcor coefficient is almost 
always positive. 4 Thus, a negative value of the first parcor coefficient 
indicates a manipulation of this parameter. If the listener knows that 
a +1 or -1 manipulation of the parameters is being employed, then a 
simple reversal of sign breaks the encryption. 
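The weakness described above can be illustrated with a short Python sketch of the sign-reversal attack on the first parcor coefficient. It assumes, as a simplification of the "almost always positive" observation, that the original first coefficient is positive in every frame; frames where that fails would be recovered incorrectly. The function name is hypothetical.

```python
def undo_sign_flip_on_k1(received_frames):
    """Restore the first parcor coefficient of each frame after a +1/-1 manipulation.

    received_frames : list of frames, each a list of received parcor values.
    Assumption: the true first coefficient is positive, so a negative received
    value means it was multiplied by -1 and the sign is simply reversed.
    """
    recovered = []
    for frame in received_frames:
        fixed = list(frame)
        if fixed[0] < 0:
            fixed[0] = -fixed[0]
        recovered.append(fixed)
    return recovered
```

The same statistical reasoning does not transfer directly to coefficients whose sign varies from frame to frame, nor to the uniform multiplier, where the original magnitude is also hidden.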

IV. EXPERIMENTAL STUDY 

In this section, we examine the effectiveness of the various encryption 
techniques in destroying the intelligibility of the output speech signal. 
For this purpose, an informal perceptual evaluation was conducted. 
To evaluate objectively the efficacy of the techniques, an lpc distance 
measure proposed by Itakura 11 was used to reinforce and supplement 
the perceptual examination. Before discussing the lpc distance
measure, we emphasize that this measure may not be a definitive or
complete description of encryption efficiency; but it is a good measure of
spectral distortion, which in turn proves to be a useful (if not ideal)
indicator of intelligibility loss.

4.1 Distance measure 

The lpc distance measure is defined as

$$ d_n = \ln\left( \frac{a_n V a_n^T}{b_n V b_n^T} \right), $$

where

$a_n$ = original lpc coefficient vector $(1, a_1, \ldots, a_p)$ measured in
the nth frame of the speech signal,

$b_n$ = lpc coefficient vector determined after manipulation of the
original parameters in the nth frame,

and

$$ V = [\, v(|i - j|) \,], \qquad i, j = 0, 1, \ldots, p, $$

where $v(\cdot)$ are the normalized correlation coefficients that are computed
directly from $b_n$. 3,10

The measure $d_n$ has been very effectively applied in problems of
speech recognition, 11 speaker recognition, 12 and variable frame-rate
synthesis. 13,14 Gray and Markel 15 have recently demonstrated that the
measure $d_n$ is very closely related to the rms spectral distance measure.
Sambur and Jayant 16 have also studied the significance of the measure,
and a complete discussion of the utility of the measure for assessing
spectral distortions can be found in their paper. For purposes of this
paper, the important facts to appreciate about the measure $d_n$ are

(i) The greater the value of $d_n$, the more pronounced the spectral
distortions of the original sound.

(ii) A value of $d_n = 0.9$ is a "perceptually" significant boundary for
evaluating spectral distortion. 13
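A numerical sketch of the distance computation is given below in Python. It is only one possible realization and rests on several assumptions: each coefficient vector is taken to hold the inverse-filter coefficients with a leading 1 (sign conventions for the remaining entries differ between texts), the manipulated filter is assumed stable, and the normalized correlations $v(m)$ are approximated from a truncated impulse response of the all-pole filter defined by the manipulated vector rather than by a closed-form recursion. All function names are hypothetical.

```python
import math

def impulse_response(b, n_samples=2000):
    """Impulse response of 1 / B(z), with B(z) = b[0] + b[1] z^-1 + ... and b[0] = 1."""
    p = len(b) - 1
    h = [0.0] * n_samples
    for n in range(n_samples):
        acc = 1.0 if n == 0 else 0.0
        for k in range(1, min(n, p) + 1):
            acc -= b[k] * h[n - k]
        h[n] = acc
    return h

def normalized_correlation(b, max_lag, n_samples=2000):
    """Approximate v(0) .. v(max_lag) for the all-pole filter defined by b."""
    h = impulse_response(b, n_samples)
    r = [sum(h[n] * h[n + m] for n in range(n_samples - m)) for m in range(max_lag + 1)]
    return [x / r[0] for x in r]

def quadratic_form(c, v):
    """Compute c V c^T where V[i][j] = v(|i - j|)."""
    size = len(c)
    return sum(c[i] * v[abs(i - j)] * c[j] for i in range(size) for j in range(size))

def lpc_distance(a, b):
    """d = ln(a V a^T / b V b^T), with V built from the manipulated vector b."""
    v = normalized_correlation(b, len(b) - 1)
    return math.log(quadratic_form(a, v) / quadratic_form(b, v))
```

As a sanity check on the definition, the distance from a vector to itself is zero, and for a single-pole example with b = (1, -0.9) and a = (1, 0.5) the ratio works out by hand to 2.15/0.19, a distance of about 2.4, well above the 0.9 threshold.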

4.2 Experiment 

For the experimental study, four sentences spoken by four different 
speakers were analyzed using a 12th order (p = 12) lpc autocorrelation 
analysis for each contiguous 20-ms frame. The sentences analyzed were: 

(i) A lathe is a big tool.

(ii) May we all learn a yellow lion roar.

(iii) Few thieves are never sent to the jug.

(iv) It's time we rounded up that herd of Asian cattle. 
The encryption schemes that were formally evaluated both
perceptually and with the distance measure of Section 4.1 were:

a. Scrambling

(1) Block length = 16

(2) Block length = 8

b. Pseudo-random manipulation*

(1) Uniform distribution of $r_{in}$ for i = 1 and $r_{in}$ = 1 for i > 1.

(2) Uniform distribution of $r_{in}$ for i ≤ 6 and $r_{in}$ = 1 for i > 6.

(3) Uniform distribution of $r_{in}$ for all i (1 ≤ i ≤ 12).

(4) ±1 distribution of $r_{in}$ for i = 1 and $r_{in}$ = 1 for i > 1.

(5) ±1 distribution of $r_{in}$ for i ≤ 6 and $r_{in}$ = 1 for i > 6.

(6) ±1 distribution of $r_{in}$ for all i (1 ≤ i ≤ 12).

Experiment b was performed to determine the number of parcor co- 
efficients that must be altered to effectively encrypt the signal. Since 
the parcor coefficients are approximately ordered in terms of their 
spectral sensitivity, 4 these experiments were performed by sequentially 
removing from manipulation the less sensitive parameters. 

4.3 Results 

4.3.1 Distance evaluation 

4.3.1.1 Uniform pseudo-random manipulation. Figure 4 illustrates the dis- 
tance-evaluation of the sentence "May we all learn a yellow lion roar" 
for the uniform pseudo-random number manipulation of the parcor 
coefficients. Parts (a), (b), and (c) of Fig. 4 indicate, respectively, the 
results of experiments b(1), b(2), and b(3) of Section 4.2. The straight
solid line in each part of the figure depicts the perceptually significant 
threshold for assessing spectral distortions (d = 0.9). Any frame with 
a distance larger than the threshold is perceptually different from the 
nonencrypted speech. To show just how dramatic the perturbation
in the spectral character of the speech can be, Fig. 5 illustrates the
calculated linear prediction spectrum (dotted line) for the
nonencrypted speech frame and the corresponding linear prediction spectrum
(solid line) for the same frame of encrypted speech. The measured
lpc distance between the illustrated spectra is $d_n = 3.0$, or
approximately the average value of distance for uniform pseudo-random
manipulation of the first coefficient. From this figure, it can be
expected that the character of the encrypted speech is completely
different from that of the original speech.

* Remember $r_{in}$ denotes the pseudo-random number multiplicative factor for the
ith parcor coefficient in the nth frame.

Fig. 4. lpc distance as a function of time across the utterance, "May we all
learn a yellow lion roar," for uniform pseudo-random perturbation of the parcor
parameters. (a) Manipulation of $k_1$; average distance = 3.4. (b) Manipulation of
$k_1$ to $k_6$; average distance = 4.4. (c) Manipulation of all $k_i$; average distance = 4.4.

The results depicted in Fig. 4 are typical of the distance evaluation 
results for the uniform pseudo-random manipulation of the parcor 
coefficients determined for the other sentences examined. It is interest- 
ing to note that the average distance for an encryption scheme that 
manipulates the first six parameters is not significantly lower than the 
average distance obtained for the manipulation of all 12 parameters. 
This result can be anticipated from the fact that the higher-ordered 
parcor coefficients are much less sensitive than the lower-ordered pa- 
rameters, and changes in these higher-ordered parameters do not 
significantly change the spectral character of the sound. 4 Thus, a less- 
expensive and equally effective encryption scheme can be obtained 
by manipulating only a few lower-ordered parameters. To determine
the optimum number of parameters necessary for an efficient, uniform,
pseudo-random encryption, we sequentially increased the number of
parcor parameters perturbed by uniform pseudo-random manipulation
and measured the average lpc distance. Figure 6 illustrates the average
lpc distance as a function of the number of parameters manipulated.
From this figure, it can be seen that a scheme that perturbs only the
first four parcor coefficients is quite efficient.

Fig. 5. Comparison of the distorted lpc spectrum and the original lpc spectrum.
Distance between the spectra equals 3.0.

Fig. 6. Average lpc distance as a function of number of parcor coefficients
manipulated by pseudo-random number techniques.

4.3.1.2 Pseudo-random manipulation of +1 or -1. Figure 7 shows the
detailed distance-evaluation scores for the +1 or -1 pseudo-random
perturbation of the sentence "May we all learn a yellow lion roar."
Parts (a), (b), and (c) of the figure correspond to experiments b(4),
b(5), and b(6), respectively. Figure 6 illustrates the average lpc
distances obtained for encryption schemes that sequentially increase the
number of parameters subjected to +1 or -1 pseudo-random
manipulations. We note from Figs. 6 and 7 that again the perturbation of
the higher-ordered parcor coefficients does not significantly add to the
effectiveness of the encryption scheme. It can also be seen from these
figures that +1 or -1 pseudo-random manipulation is generally
superior (except for the manipulation of only $k_1$) to the uniform
pseudo-random number scheme in distorting the speech signal.
However, as noted previously, this form of encryption is easier to break
than uniform pseudo-random number coding.

4.3.1.3 Scrambling. Figure 8 shows the frame-by-frame distance scores
for the scrambling of the parcor coefficients for the sentence "May we
all learn a yellow lion roar." The illustrated results are typical of the
results obtained for the other analyzed sentences. A comparison with the
distance results of the pseudo-random schemes (Fig. 6) shows that a
scrambling encryption with a block length of only eight samples (L = 8)
is at least as effective in distorting the spectral properties of the original
signal as a pseudo-random manipulation of the first parcor coefficient.
A scrambling scheme with a block length of 16 (L = 16) or more
samples is superior to any of the pseudo-random schemes studied. It is
interesting to note that the scrambling manipulation saturates in
effectiveness for block lengths greater than 16. Since the range of the
parcor coefficients is confined to $-1 \le k_i \le 1$, increasing the block
length beyond 16 does not increase the dynamic range of the samples
within the block and, thus, the effectiveness of the scrambling is not
enhanced for L > p (see Section 3.2).

4.3.2 Perceptual evaluation 

To support the results of the distance study, the various encrypted
utterances were presented to a group of listeners for an informal
perceptual evaluation of the manipulation schemes. To avoid any
problems posed by the awkward linguistic content of the analyzed
sentences, the listeners in this study were all familiar with the
sentences, and were also informed that the utterances to be heard were
typical sentences used to evaluate vocoder systems.

Fig. 7. lpc distance as a function of time across the utterance "May we all learn
a yellow lion roar" for the +1 or -1 pseudo-random manipulation of the parcor
coefficients. (a) Manipulation of $k_1$; average distance = 2.8. (b) Manipulation of $k_1$
to $k_6$; average distance = 6.2. (c) Manipulation of all $k_i$; average distance = 6.3.

Fig. 8. lpc distance as a function of time across the utterance "May we all learn
a yellow lion roar" for the scrambling of the parcor coefficients. (a) Block length = 8;
average distance = 3.8. (b) Block length = 16; average distance = 7.7. (c) Block
length = 64; average distance = 7.6.

The listeners in the experiment were asked to determine the intel- 
ligibility of the utterance and to rank-order the effectiveness of the 
encryption schemes. For all the techniques studied, except for the +1 
or -1 manipulation of only $k_1$, the listeners unanimously agreed that
the encrypted utterances were clearly nonspeech-like. However, for 
the uniform pseudo-random techniques manipulating only the first 
parcor coefficients, the listeners noted that, even though the complete 
utterances could not be understood, there were certain instances in the 
encrypted utterances that were somewhat speech-like and under- 
standable. These instances probably correspond to points in the 
encrypted speech for which the lpc distances fall below the perceptual 
threshold. In characterizing the nonspeech-like quality of the en- 
crypted utterances, the listeners termed the pseudo-random perturbed 
utterances as sounding like "one continuous buzz;" the scrambled 
utterances sounded like "water running through a pipe." 

In rank-ordering the encryption schemes, the listeners were quite
definite in characterizing the +1 or -1 pseudo-random manipulation
of only the first parcor coefficient as least effective. The scrambling
with block length of 16 (d = 7.7) was ranked about equal to the +1
or -1 pseudo-random manipulation of all 12 parcor coefficients
(d = 6.3), and also to the same manipulation of only the first six
coefficients (d = 6.2). The uniform pseudo-random scheme that
altered all 12 coefficients (d = 4.4) was ranked equal to the scheme that
perturbed only the first six coefficients (d = 4.4), and both techniques
were ranked slightly less effective than the scrambling with block
length of 16 (d = 7.7) and the equivalent +1 or -1 pseudo-random
schemes. The other techniques were ranked somewhere in the middle.
The perceptual rank-ordering of the various manipulation schemes
corresponded almost exactly to the distance evaluation and, thus,
reinforced the conclusions in that evaluation.

1386 THE BELL SYSTEM TECHNICAL JOURNAL, NOVEMBER 1976

V. CONCLUSIONS 

There is great interest in low-bit-rate speech-transmission systems 
and in the "securing" of these transmission systems. The purpose of 
this paper is to investigate various methods for encrypting a low-bit- 
rate lpc transmission system. The methods chosen for investigation 
were schemes that either scrambled the string of input parcor coeffi- 
cients or multiplied the coefficients by a pseudo-random number. The 
schemes were evaluated by an informal perceptual experiment and by 
the use of an lpc distance measure. The results of the evaluations 
suggest that all the schemes are somewhat successful in distorting the 
original signal. The most successful scheme was the scrambling tech- 
nique with a block length of 16 samples. The pseudo-random manipula- 
tions were almost as effective. 

In viewing the results of the evaluations, it is important to note that 
the distortion of the speech signal is only one consideration in designing 
an encryption system. Another consideration is the difficulty of 
"breaking" the security code. Of the codes examined, the uniform 
pseudo-random number manipulation is the most difficult to break. 
The scrambling scheme is the next most difficult and the +1 or -1
pseudo-random scheme is the easiest. Still another consideration is the
transmitter-end complexity of the encryption scheme. Although this 
complexity is somewhat difficult to assess, it appears that the scram- 
bling scheme is the least complex and the uniform pseudo-random ma- 
nipulation is the most complex. In choosing any of these encryp- 
tion schemes, a user would balance the various merits and liabilities of 
the techniques. 